Method and arrangement for predicting measurement data
using given measurement data 

The invention relates to a method and arrangement for 
predicting measurement data using given measurement 
data.

A technical system often requires facilities for 
forecasting based on known (measurement) data, 
particularly in the context of error susceptibility or 
cost estimates. 

Forecasts generated by experts are generally subject to 
errors. Experts cannot carry out exact analyses, at 
least of highly complex systems. 

A stochastic point process, in particular a Poisson 
process, is described in [1].

The object of the invention is to allow the automatic 
prediction (forecast) of measurement data using given 
measurement data.

This object is achieved in accordance with the features 
of the independent patent claims. Developments of the 
invention are described in the dependent claims. 

In order to achieve the object, a method is proposed 
for predicting measurement data using given measurement 
data, in which a stochastic process is matched to the 
given measurement data. Simulation runs are carried out 
from a given time-point until a final time-point. The 
forecast measurement data is determined for each 
simulation run. Measurement data for the final
time-point is predicted within a range of values, which
is governed by the forecast measurement data.

One development is to define a confidence range for the 
prediction of measurement data, where the a% lowest and 
b% highest forecast measurement data are eliminated. In 
particular, a% can equal b%. For example, a 95%
confidence range can thus be defined by ignoring the 
2.5% lowest and 2.5% highest forecast measurement data. 
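
The trimming rule can be sketched in C as follows. This is a minimal illustration, assuming the forecasts of all simulation runs are already collected in an array; the function name trimmed_range, its parameters and the choice to round the number of dropped values down are assumptions and are not taken from the listings at the end of this description.

```c
#include <stdlib.h>

/* qsort callback: ascending order of long values */
static int cmp_long(const void *x, const void *y)
{
    long a = *(const long *)x, b = *(const long *)y;
    return (a > b) - (a < b);
}

/* Sorts forecast[0..m-1] in place, ignores the a_pct% lowest and the
   b_pct% highest values and stores the remaining extremes in *lo, *hi.
   Returns 0 on success, -1 if no value would remain. */
int trimmed_range(long *forecast, size_t m, double a_pct, double b_pct,
                  long *lo, long *hi)
{
    size_t drop_lo, drop_hi;

    if (m == 0 || a_pct < 0.0 || b_pct < 0.0) return -1;
    qsort(forecast, m, sizeof forecast[0], cmp_long);
    drop_lo = (size_t)(m * a_pct / 100.0);   /* rounded down (assumption) */
    drop_hi = (size_t)(m * b_pct / 100.0);
    if (drop_lo + drop_hi >= m) return -1;
    *lo = forecast[drop_lo];
    *hi = forecast[m - 1 - drop_hi];
    return 0;
}
```

With a_pct = b_pct = 2.5 this corresponds to the 95% confidence range of the example above.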

One advantage is that the measurement data can be 
predicted (forecast) with an accuracy that is within a 
confidence range, from a given time-point. This makes 
it possible to identify e.g. the feasibility or 
impossibility of a task associated with the measurement 
data, at an early stage. Appropriate measures can 
therefore be initiated in order to counteract forecast 
impossibility. This is particularly important in the 
case of a complex system, e.g. a software development 
process, where the extent to which a schedule can be 
followed before the software is completed can be shown 
in a subsequent test phase. Even more important in this 
context is the ability to adopt countermeasures at an 
early stage if a delay has been clearly identified, 
e.g. in an integration test phase. This firstly affects 
the feasibility of the specified deadline (timescale) 
and secondly directly affects costs, since non- 
compliance with the agreed timescale often results in 
additional costs. 

One refinement is for the stochastic process to be a
non-homogeneous Poisson process.

In particular, the measurement data may in one
refinement comprise numbers of errors. This applies to
software development, for example, where the level of
maturity is documented in accordance with the errors
measured in a test phase. Completion is directly
dependent on this level of maturity. In other words,
the software cannot be delivered to customers until
most of the errors have been removed from the software.
This is particularly important with regard to resources
(required to test and correct errors) and costs (due to
delayed delivery).

In order to achieve the object, a method is also 
proposed for predicting measurement data using given 
measurement data, in which a stochastic process is 
matched to the given measurement data. A range is 
ascertained, by sorting the probability values 
generated by the stochastic process according to size, 
around an expected value. Measurement data is predicted 
on the basis of this range, and in particular the 
probability values within the range. 

One development is for the probability values generated 
by the stochastic process to be sorted symmetrically by 
size around the expected value. In particular, this 
means that the highest probability value represents the 
middle of the range, i.e. the expected value, whereas 
the next highest probability value is arranged to the 
right or left of the expected value. The next highest 
probability value is then arranged symmetrically on the 
other side of the expected value, in turn. 

This analytical (design) procedure provides a range, 
where the breadth of the range in turn indicates which 
probability values are significant in the prediction of 
the measurement data. 

In one particular refinement, the breadth of the range
is determined by ignoring the probability values that
lie below a given threshold.

This produces a range (confidence range), which has a
specific breadth as a result of the threshold. This 
breadth corresponds to the certainty with which the 
measurement data is predicted. 

If one assumes that the stochastic process is a
non-homogeneous Poisson process, then the non-homogeneous
Poisson process defines a step size, particularly on a
time axis t, which indicates when the next error will
occur. One characteristic of the non-homogeneous
Poisson process is that it has no memory, so that a
"no-memory" search is carried out from each error that
occurs at a specific time-point, for a time-point that
indicates the next error.

In order to achieve the object, an arrangement is also 
proposed for predicting measurement data using given 
measurement data, whereby a processor unit is provided 
and configured in such a way that: 

a) a stochastic process can be matched to the given
measurement data;

b) simulation runs can be carried out from a given
time-point until a final time-point;

c) the forecast measurement data can be determined for
each simulation run;

d) measurement data for the final time-point can be
predicted within a range of values, which is determined
by the forecast measurement data.

In order to achieve the object, an arrangement is
further proposed for predicting measurement data using
given measurement data, whereby a processor unit is
provided and configured in such a way that:

a) a stochastic process can be matched to the given
measurement data;

b) a range can be ascertained by sorting probability
values generated by the stochastic process according
to size around an expected value;

c) the measurement data is predicted within the limits
of the range.



The arrangements are particularly suitable for carrying
out the inventive method or the developments described
above.

Exemplary embodiments of the invention are shown and
explained below with reference to the drawings, in
which:

Fig. 1 is a graph showing an accumulated number of
       errors over a test period;

Fig. 2 is a graph showing the superimposed confidence
       ranges for different process models;

Fig. 3 is a flowchart showing the steps in a method
       for predicting measurement data using given
       measurement data;

Fig. 4 is a further flowchart showing the steps in a
       method for predicting measurement data using
       given measurement data;

Fig. 5 shows a processor unit.

In order to be able to forecast a number of errors to
be expected in a technical process, e.g. in a software
development process, non-homogeneous Poisson
processes (NHPP) are calibrated, i.e. matched to
measurement data, e.g. the occurrence of errors over
time, as follows:



The following expression describes a counting process
associated with the stochastic point process
(non-homogeneous Poisson process):

    $\{N(t)\}_{t \in \mathbb{R}_+}$    (1)

and a time-point $t_0$ defines the end of a test period,
i.e. a time-point at which the given data ends. The
stochastic processes

    $\{u(t)\}_{t \in \mathbb{R}_+}$ and    (2)

    $\{o(t)\}_{t \in \mathbb{R}_+}$    (3)

are searched with

    $P(u(t) \le N(t) - N(t_0) \le o(t) \mid N(t_0) = n_0) \ge \alpha$    (4),

for all time-points where $t > t_0$ and given values
$\alpha \in (0,1)$ (confidence level) and $n_0 \in \mathbb{N}$.
In particular, the following text examines the
increases in the stochastic counting process in
relation to the time-point $t_0$.

In the present case, where equation (1) represents a
non-homogeneous Poisson process, the following equation
(cf. [1])

    $P(N(t_1) - N(t_0) = \ell) = \exp(-[i(t_1) - i(t_0)]) \cdot \frac{[i(t_1) - i(t_0)]^{\ell}}{\ell!}$    (5)

applies for

    $0 \le t_0 < t_1 < \infty, \quad \ell \in \mathbb{N}_0$    (6)

and an intensity (mean measure, mean value function) of

    $i: \mathbb{R}_+ \to \mathbb{R}_+, \quad t \mapsto i(t) = \mathrm{E}\,N(t)$    (7).

Since the nature of the Poisson process dictates that
the increases (error increases in this case) are
independent of previous increases, the search, using
equation (5), for a (minimum) range

    $[g_u, g_o] \equiv [g_u^{t_0}, g_o^{t_0}] \subset \mathbb{N}_0$    (8)

for the time-points $t > t_0$ can be simplified to

    $\sum_{\ell = g_u}^{g_o} P(N(t) - N(t_0) = \ell) \ge \alpha$    (9).



Due to the unimodal nature of the Poisson count
density, a range $[g_u, g_o]$ can be determined as follows:

Step 1: Sort the elementary probabilities

    $p_\ell := P(N(t) - N(t_0) = \ell), \quad \ell \in \mathbb{N}_0$

into descending order and label the values
sorted thus using

    $p_{(0)}, p_{(1)}, \ldots$, i.e. $\{p_0, p_1, \ldots\} = \{p_{(0)}, p_{(1)}, \ldots\}$ and
    $p_{(0)} \ge p_{(1)} \ge \ldots$;

Step 2: Determine

    $\ell_{\min} := \min\{\ell \in \mathbb{N}_0 : \sum_{i=0}^{\ell} p_{(i)} \ge \alpha\}$;

Step 3: Determine an index set

    $I := \{i_0, \ldots, i_{\ell_{\min}}\} \subset \mathbb{N}_0$

where

    $\{p_{i_0}, \ldots, p_{i_{\ell_{\min}}}\} = \{p_{(0)}, \ldots, p_{(\ell_{\min})}\}$;

Step 4: Substitute

    $g_u := \min_{i \in I}\{i\}$ and $g_o := \max_{i \in I}\{i\}$.

The range from equation (8) is also referred to as the
forecast range.

Stochastic simulation (second approach)

It is possible to determine the confidence range
described using simulation, with the following steps:

Step 1: Start $m \in \mathbb{N}$ independent simulation runs
based on the selected process model at time-point $t_0$
of the last error message;

Step 2: End a simulation run as soon as the required
final time-point $t_e$ is reached;

Step 3: Repeat Step 2 until all simulation runs are
finished;

Step 4: Sort the numbers $N_i(t_e)$ of the errors generated
in the i-th simulation run in the time period $(t_0, t_e)$,
$i = 1, \ldots, m$, in descending order, and label the values
sorted thus $N_{(1)}(t_e), \ldots, N_{(m)}(t_e)$;

Step 5: Substitute

    $g_u := N_{(\lfloor m \cdot \alpha/2 \rfloor)}(t_e)$

and [...] the $(100 \cdot (1-\alpha)/2)\%$ lowest and [...]

This produces the confidence range directly.

Each individual simulation run is based on a simulation
algorithm, which is known from [2]:

Simulated generation of intermediate arrival times for
a non-homogeneous Poisson process:

Step 1: Substitute $\bar\lambda := \sup_{t \ge t_s}\{\lambda(t)\}$, where:

    $\lambda(t) := \frac{d\,i(t)}{dt}$    (10)

Step 2: Generate a (pseudo) random variable x that is
exponentially distributed with the parameter $\bar\lambda$, i.e.
$x := -\log(U)/\bar\lambda$, where U is uniformly distributed
over (0,1).

Step 3: Generate a random variable U that is uniformly
distributed over (0,1).

Step 4: If $U \le \lambda(t_s + x)/\bar\lambda$, then substitute
$t^* := t_s + x$; otherwise substitute $t_s := t_s + x$ and
go to Step 1.

The example graph in Fig. 1 shows an accumulated number
of errors during a given test period. From time-point
$t_0$, it shows a prediction range for all time-points
$t_0 + x$.



The intensity i is normally derived from equation (10)
for $\lambda$. For example, the result is as follows:

a) $\lambda(t) = a \cdot b \cdot c \cdot \exp(-b\,t^c) \cdot t^{c-1}$

($\lambda(t)$ is strictly monotonically decreasing for
$c \le 1$, and unimodal for $c > 1$ with a unique
maximum at the point

    $t = \left(\frac{c-1}{b\,c}\right)^{1/c}$).

b) Otherwise, $\bar\lambda$ is derived in accordance with the
above comments as follows: [...]
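
As a worked check of the maximum point, the following derivation is a sketch assuming a, b, c > 0. It is reconstructed from the case distinction that the simulation listing (function sim_nhpp) makes when it bounds the intensity, and is not a formula quoted from the text; the symbol t_max is introduced here for illustration.

```latex
\begin{align*}
\lambda(t) &= a\,b\,c\,t^{c-1}\exp(-b\,t^{c}), \\
\frac{d}{dt}\ln\lambda(t) &= \frac{c-1}{t} - b\,c\,t^{c-1} = 0
  \;\Longrightarrow\; t^{c} = \frac{c-1}{b\,c}
  \;\Longrightarrow\; t_{\max} = \left(\frac{c-1}{b\,c}\right)^{1/c}, \\
\bar\lambda = \sup_{t \ge t_s}\lambda(t) &=
  \begin{cases}
    \lambda(t_s), & c \le 1 \text{ or } t_s \ge t_{\max}, \\
    \lambda(t_{\max}), & c > 1 \text{ and } t_s < t_{\max}.
  \end{cases}
\end{align*}
```

For c <= 1 the logarithmic derivative is negative for all t > 0, which is why the intensity at the current time-point already bounds all later values.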

The graph in Fig. 2 shows the superimposed confidence
ranges. In particular, this illustrates that possible
forecasts become more scattered the further they extend
into the future. Confidence ranges calculated using
different process models can be compared in the same
way as shown in Fig. 2.

Fig. 3 shows a flowchart for the steps of a method for
predicting measurement data using given measurement
data. In Step 301, a stochastic process, in particular
a non-homogeneous Poisson process (to represent a
stochastic count process), is matched to given
measurement data. In Step 302, simulation runs are
carried out from time-point $t_0$ to a final time-point
$t_e$ that is to be forecast. In Step 303, for each
simulation run, forecast measurement data is determined
and a prediction of measurement data is restricted to a
range which is covered by the measurement data
determined by the simulation runs (see Step 304). In
Step 305, a confidence range is determined in which a
given proportion of the lowest and highest forecast
measurement data is ignored in each case (this
corresponds to the aforementioned range). The method
terminates in Step 306.

Fig. 4 shows a further flowchart for the steps of a
method for predicting measurement data using given
measurement data. In Step 401, a stochastic process, in
particular a non-homogeneous Poisson process, is matched
to the given measurement data. Probability values are
determined using the stochastic process, and these are
sorted according to size around an expected value (see
Step 402). This sort operation results in the
definition of a range, namely a confidence range in
this case. The breadth of the confidence range is
determined by comparing the accumulated probabilities
with a given threshold. As described above, the
confidence range gives a distribution or uncertainty,
respectively, for a time-point in the future (after
$t_0$), which allows the measurement data to be estimated
in the future (see Step 403). The method terminates in
Step 404.

Fig. 5 shows a processor unit PRZE. The processor unit
PRZE comprises a processor CPU, a memory unit MEM, and
an input/output interface IOS, which is used in
different ways via an interface IFC: a graphics
interface allows output to be viewed on a monitor MON
and/or output to a printer PRT. Inputs are entered via
a mouse MAS or a keyboard TAST. The processor unit PRZE
also includes a data bus BUS, which provides the
connection between the memory unit MEM, the processor
CPU and the input/output interface IOS. It is also
possible to connect additional components to the data
bus BUS, e.g. additional memory, data storage (hard
disk) or a scanner.

The C programming language is used in the following
examples, which show an algorithm to define confidence
ranges for forecasts and an algorithm for simulated
definition of confidence ranges for forecasts.

/* Definition of confidence ranges for forecasts */
/* based on the generalized Goel-Okumoto model */

#include <stdlib.h>
#include <math.h>
#include <stdio.h>

#define true 1
#define false -1

double mv_genGO(double,double,double,double);
double poisson(double,long);
void ki_nhpp(double (*)(double,double,double,double),
             double,double,double,double,double,double,long *,long *);

int main(int argc, char *argv[])
{
  double a,b,c,bt,st,kn;
  long low,upp,lauf;

  if (argc<7) {
    printf("\n\nToo few arguments!\n\n");
    printf("Usage: %s <Par1> <Par2> <Par3> <start time> <end time> "
           "<confidence level>\n\n", argv[0]);
    return 1;
  }

  a = atof(argv[1]);
  b = atof(argv[2]);
  c = atof(argv[3]);
  bt= atof(argv[4]);
  st= atof(argv[5]);
  kn= atof(argv[6]);

  /* forecast range at ten equidistant time-points between bt and st */
  for (lauf=1;lauf<=10;lauf++) {
    ki_nhpp(mv_genGO,a,b,c,bt,bt+lauf*(st-bt)/10.,kn,&low,&upp);
    printf("Time-point: %8.2f Error range: [%ld, %ld]\n",
           bt+lauf*(st-bt)/10., low, upp);
  }

  return 0;
}

/* mean value function of the generalized Goel-Okumoto model */
double mv_genGO(double x, double a, double b, double c)
{ return( a*(1.0-exp(-b*pow(x,c))) ); }

/* Poisson probability of the count wert for the mean lambda */
double poisson(double lambda, long wert)
{
  long i;
  double itval,hv;

  if (wert<0) { return 0.0; }           /* negative counts are impossible */
  if (wert==0) { return exp(-lambda); }

  if (lambda<600) {
    itval = exp(-lambda);
    for (i=wert;i>=1;i--) { itval *= lambda/(double)i; }
  }
  else {
    /* exp(-lambda) would underflow: spread it over the wert factors */
    hv = exp(-lambda/(double)wert);
    itval = 1.0;
    for (i=wert;i>=1;i--) { itval *= lambda/(double)i*hv; }
  }

  return ( itval );
}

void ki_nhpp(double (*mv_nhpp)(double,double,double,double),
             double par1_nhpp, double par2_nhpp, double par3_nhpp,
             double start_time, double stop_time, double k_niveau,
             long *lower, long *upper)
{
  long lauf;
  int lborder,mod_low,mod_upp;
  double sum,tmp_mv,val_l,val_u;

  /* expected number of errors between start_time and stop_time */
  tmp_mv = mv_nhpp(stop_time,par1_nhpp,par2_nhpp,par3_nhpp) -
           mv_nhpp(start_time,par1_nhpp,par2_nhpp,par3_nhpp);
  lauf = (long)tmp_mv;   /* start at the peak of the count density */
  *lower = lauf-1;
  *upper = lauf+1;
  lborder = false;
  mod_low = false;
  mod_upp = false;
  sum = poisson(tmp_mv,lauf);
  val_l = poisson(tmp_mv,*lower);
  val_u = poisson(tmp_mv,*upper);

  /* add the larger neighbouring probability until the level is reached */
  while (sum<k_niveau) {
    if (val_l<val_u) {
      sum += val_u;
      (*upper)++;
      lborder = false;
      mod_upp = true;
      val_u = poisson(tmp_mv,*upper);
    }
    else {
      sum += val_l;
      (*lower)--;
      lborder = true;
      mod_low = true;
      val_l = poisson(tmp_mv,*lower);
    }
  }

  if (mod_low == false && mod_upp == false) {
    /* the peak alone already reaches the level */
    *lower = lauf;
    *upper = lauf;
    return;
  }

  if (lborder == true) { (*lower)++; }
  else { (*upper)--; }

  if (mod_low == false) { (*lower)++; }
  if (mod_upp == false) { (*upper)--; }

  return;
}

/* Simulated definition of confidence ranges for forecasts */
/* based on the generalized Goel-Okumoto model */

#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <stdio.h>

#define true 1
#define false -1

double drand48(void);
void srand48(long);

double sim_exp(double);
double lambda_genGO(double,double,double,double);
void sim_nhpp(double (*)(double,double,double,double),
              double,double,double,double,double,double [],long *);

/* static: 8 MB of event times would overflow a typical stack */
static double pnt[1000000];

int main(int argc, char *argv[])
{
  time_t t;
  double a,b,c,bt,st,check_time[12];
  long lauf,no_pnt,seed_run;
  int clauf;
  FILE *datei;

  if (argc<6) {
    printf("\n\nToo few arguments!\n\n");
    printf("Usage: %s <Par1> <Par2> <Par3> <start time> <end time>\n\n",
           argv[0]);
    return 1;
  }

  /* run counter in sim.seed gives successive runs different seeds */
  datei = fopen("sim.seed","r");
  if (datei==NULL) {
    seed_run = 1;
  }
  else {
    if (fscanf(datei,"%6ld",&seed_run)!=1) { seed_run = 0; }
    fclose(datei);
    seed_run++;
  }

  datei = fopen("sim.seed","w+");
  if (datei==NULL) { return 1; }
  fprintf(datei,"%6ld\n",seed_run);
  fclose(datei);

  time(&t);               /* initialize the          */
  t += seed_run*100;      /* random number generator */
  srand48((long) t);      /* using the system time   */

  a = atof(argv[1]);
  b = atof(argv[2]);
  c = atof(argv[3]);
  bt= atof(argv[4]);
  st= atof(argv[5]);

  sim_nhpp(lambda_genGO,a,b,c,bt,st,pnt,&no_pnt);
  for (lauf=1;lauf<=no_pnt;lauf++) {
    printf("%15.7f %10ld \n", pnt[lauf], lauf);
  }

  datei = fopen("ki.tmp","a");
  if (datei==NULL) { return 1; }
  for (lauf=1;lauf<=10;lauf++) {
    check_time[lauf] = bt+lauf*(st-bt)/10.;
  }
  check_time[11] = pnt[no_pnt]+1;  /* greater than the largest simulated time */
  clauf = 1;

  /* for each check time, write the number of errors simulated before it */
  for (lauf=1;lauf<=no_pnt;lauf++) {
    while (pnt[lauf]>=check_time[clauf]) {
      fprintf(datei,"%8.2f %6ld ", check_time[clauf], lauf-1);
      clauf++;
    }
  }

  if (pnt[no_pnt]<check_time[10]) {
    for (lauf=clauf;lauf<=10;lauf++) {
      fprintf(datei,"%8.2f %6ld ", check_time[lauf], no_pnt);
    }
  }

  fprintf(datei,"\n");
  fclose(datei);

  return 0;
}

/* exponentially distributed random number with parameter lambda */
double sim_exp(double lambda)
{ return( -log(drand48())/lambda ); }

/* intensity of the generalized Goel-Okumoto model */
double lambda_genGO(double x, double a, double b, double c)
{ return( a*b*c*pow(x,c-1)*exp(-b*pow(x,c)) ); }

void sim_nhpp(double (*lambda_nhpp)(double,double,double,double),
              double par1_nhpp, double par2_nhpp, double par3_nhpp,
              double start_time, double stop_time,
              double path[], long *no_points)
{
  double sim_time,x,u,x_bar,lambda_bar;

  *no_points=0;
  sim_time = start_time;

  do {
    /* upper bound of the intensity from sim_time onwards */
    if (par3_nhpp<=1) {
      lambda_bar = lambda_nhpp(sim_time,par1_nhpp,par2_nhpp,par3_nhpp);
    }
    else {
      /* x_bar: point at which the intensity is maximal */
      x_bar = pow((par3_nhpp-1.0)/par2_nhpp/par3_nhpp,1.0/par3_nhpp);
      if (sim_time>=x_bar) {
        lambda_bar = lambda_nhpp(sim_time,par1_nhpp,par2_nhpp,par3_nhpp);
      }
      else {
        lambda_bar = lambda_nhpp(x_bar,par1_nhpp,par2_nhpp,par3_nhpp);
      }
    }

    x = sim_exp(lambda_bar);
    u = drand48();

    /* accept the candidate time-point with probability lambda/lambda_bar */
    if (u<=lambda_nhpp(sim_time+x,par1_nhpp,par2_nhpp,par3_nhpp)/lambda_bar) {
      (*no_points)++;
      path[*no_points]=sim_time+x;
    }

    sim_time+=x;
  }
  while (sim_time<=stop_time);

  return;
}

/* Definition of confidence ranges from the simulation data */
/* (the simulation data is sorted into ascending order) */

#include <stdlib.h>
#include <math.h>
#include <stdio.h>

int qsort_icmp(const void *,const void *);

/* comparison function for qsort: ascending order of int values */
int qsort_icmp(const void *x, const void *y)
{
  int a = *(const int *)x, b = *(const int *)y;

  if (a<b) { return ( -1 ); }
  else if (a==b) { return ( 0 ); }
  else { return ( 1 ); }
}

/* static: the tables are too large for a typical stack */
static int pnt[11][100000];
static int qs[100000];

int main(int argc, char *argv[])
{
  char *dname;
  int frac,i;
  long lauf,lower_bound,upper_bound;
  long l;
  double ctime[11];
  FILE *datei;

  if (argc<3) {
    printf("\n\nToo few arguments!\n\n");
    printf("Usage: %s <file name> <confidence level (in %%)>\n\n", argv[0]);
    return 1;
  }

  dname = argv[1];
  frac = 100-atoi(argv[2]);
  lauf = 0;

  /* each line holds ten pairs (check time, number of errors) of one run */
  datei = fopen(dname,"r");
  if (datei==NULL) { return 1; }
  else {
    while (!feof(datei) && lauf<99999) {
      lauf++;
      for (i=1;i<=9;i++) {
        fscanf(datei,"%8lf %6d", &ctime[i], &pnt[i][lauf]);
      }
      fscanf(datei,"%8lf %6d \n", &ctime[10], &pnt[10][lauf]);
    }
    fclose(datei);
  }

  /* positions of the bounds in the sorted data */
  lower_bound = (long)floor(lauf*frac/200.);
  upper_bound = (long)ceil(lauf*(200.-frac)/200.);
  if (lower_bound<1) { lower_bound = 1; }

  printf("\n\n%2d%% confidence range from %ld simulation runs\n\n",
         100-frac, lauf);
  for (i=1;i<=10;i++) {
    for (l=1;l<=lauf;l++) {
      qs[l] = pnt[i][l];
    }
    qsort(&qs[1], (size_t)lauf, sizeof(int), qsort_icmp);
    printf("Time-point %8.2f Error range: [%d,%d]\n",
           ctime[i], qs[lower_bound], qs[upper_bound]);
  }

  return 0;
}